Incremental Truncated LSTD
Authors
Abstract
Balancing computational efficiency and sample efficiency is an important goal in reinforcement learning. Temporal difference (TD) learning algorithms stochastically update the value function, with a time complexity linear in the number of features, whereas least-squares temporal difference (LSTD) algorithms are sample efficient but can be quadratic in the number of features. In this work, we develop an efficient incremental low-rank LSTD(λ) algorithm that progresses towards the goal of better balancing computation and sample efficiency. The algorithm reduces the computation and storage complexity to the number of features times the chosen rank parameter, while summarizing past samples efficiently enough to nearly obtain the sample efficiency of LSTD. We derive a simulation bound on the solution given by the truncated low-rank approximation, illustrating a bias-variance trade-off dependent on the choice of rank. We demonstrate that the algorithm effectively balances computational complexity and sample efficiency for policy evaluation in a benchmark task and a high-dimensional energy allocation domain.
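As a rough illustration of the idea, the sketch below (plain NumPy; all names are ours, not the paper's code) accumulates the standard LSTD(λ) statistics and then solves through a rank-k truncated SVD of A. The paper's contribution is to maintain this low-rank factorization incrementally in O(dk) per step; for clarity this simplified version forms the full d-by-d matrix instead.

    import numpy as np

    def truncated_lstd(transitions, d, gamma=0.99, lam=0.9, rank=10):
        # transitions: iterable of (phi, reward, phi_next) with d-dimensional features
        A = np.zeros((d, d))
        b = np.zeros(d)
        z = np.zeros(d)                          # eligibility trace
        for phi, r, phi_next in transitions:
            z = gamma * lam * z + phi            # decay trace, add current features
            A += np.outer(z, phi - gamma * phi_next)
            b += r * z
        # Solve with a rank-k pseudo-inverse of A; the rank is the
        # bias-variance knob the abstract describes.
        U, s, Vt = np.linalg.svd(A)
        k = int(min(rank, np.sum(s > 1e-10)))
        return Vt[:k].T @ ((U[:, :k].T @ b) / s[:k])

The returned weight vector gives the value estimate phi @ theta for any feature vector phi; a small rank discards low-energy directions of A (more bias, less variance), while a rank near d recovers the ordinary LSTD(λ) solution.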
Similar resources
Finite element simulation of two-point incremental forming of free-form parts
The two-point incremental forming method is considered a modern technique for manufacturing shell parts. The presence of a bottom punch during the process makes this technique far more complex than its conventional counterpart, i.e., the single-point incremental forming method. Thus, the numerical simulation of this method is an essential task, which leads to the reduction of trial-and-error costs and predicts th...
Incremental Least-Squares Temporal Difference Learning
Approximate policy evaluation with linear function approximation is a commonly arising problem in reinforcement learning, usually solved using temporal difference (TD) algorithms. In this paper we introduce a new variant of linear TD learning, called incremental least-squares TD learning, or iLSTD. This method is more data-efficient than conventional TD algorithms such as TD(0) and is more comp...
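A minimal sketch of that idea, under our own naming (not the authors' code): keep the LSTD statistics A and b up to date with rank-one additions, track the residual mu = b - A @ theta, and spend only m cheap greedy coordinate updates on theta per time step. The published algorithm exploits feature sparsity for its per-step cost; this dense version just shows the update structure.

    import numpy as np

    class ILSTDSketch:
        def __init__(self, d, gamma=0.99, alpha=0.01, m=1):
            self.A = np.zeros((d, d))
            self.b = np.zeros(d)
            self.theta = np.zeros(d)
            self.mu = np.zeros(d)                # residual b - A @ theta
            self.gamma, self.alpha, self.m = gamma, alpha, m

        def update(self, phi, r, phi_next):
            u = phi - self.gamma * phi_next
            self.A += np.outer(phi, u)           # rank-one update of A
            self.b += r * phi
            self.mu += r * phi - phi * (u @ self.theta)  # keep residual consistent
            for _ in range(self.m):              # m greedy coordinate steps
                j = int(np.argmax(np.abs(self.mu)))
                step = self.alpha * self.mu[j]
                self.theta[j] += step
                self.mu -= step * self.A[:, j]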
iLSTD: Eligibility Traces and Convergence Analysis
We present new theoretical and empirical results with the iLSTD algorithm for policy evaluation in reinforcement learning with linear function approximation. iLSTD is an incremental method for achieving results similar to LSTD, the data-efficient, least-squares version of temporal difference learning, without incurring the full cost of the LSTD computation. LSTD is O(n²), where n is the number of...
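For the trace-based variant analyzed here, the only change to the statistics is that the rank-one updates are built from an eligibility trace z rather than the current feature vector. A sketch under the same assumptions as above (the small ridge term is our addition, for invertibility):

    import numpy as np

    def lstd_lambda_solution(transitions, d, gamma=0.99, lam=0.9, ridge=1e-6):
        A, b, z = np.zeros((d, d)), np.zeros(d), np.zeros(d)
        for phi, r, phi_next in transitions:
            z = gamma * lam * z + phi                  # eligibility trace
            A += np.outer(z, phi - gamma * phi_next)   # trace-weighted rank-one update
            b += r * z
        # The fixed point that iLSTD(lambda) converges to.
        return np.linalg.solve(A + ridge * np.eye(d), b)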
Two point incremental forming of a complicated shape with negative and positive dies
In this work, incremental sheet forming of a complicated shape is investigated experimentally. Two-point incremental forming with negative and positive dies is employed to manufacture a complicated shape with positive and negative truncated cones. The material is aluminum alloy 3105 with a thickness of 1 mm. The effects of process parameters such as the sequence of positive and negative form...
Properties of the Least Squares Temporal Difference learning algorithm
This paper focuses on policy evaluation using the well-known Least Squares Temporal Differences (LSTD) algorithm. We give several alternative ways of looking at the algorithm: the operator-theory approach via the Galerkin method, the statistical approach via instrumental variables, as well as the limit of the TD iteration. Further, we give a geometric view of the algorithm as an oblique projectio...
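All of these views characterize the same estimator. In standard notation (features \(\phi_t\), rewards \(r_t\), discount \(\gamma\)), LSTD solves the sampled linear system

\[
\hat{A} = \sum_t \phi_t \,(\phi_t - \gamma \phi_{t+1})^{\top}, \qquad
\hat{b} = \sum_t r_t \,\phi_t, \qquad
\hat{\theta} = \hat{A}^{-1} \hat{b},
\]

and the induced value estimate \(\Phi\hat{\theta}\) is what the abstract interprets geometrically as an oblique projection of the Bellman target onto the span of the features.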
Publication date: 2016